Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences

Abstract

Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has independently applied reward learning to these different data sources. However, there exist many domains where multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information, which are either passively or actively collected from human users. In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on their true reward. This algorithm not only enables us to combine multiple data sources, but it also informs the robot when it should leverage each type of information. Further, our approach accounts for the human's ability to provide data, yielding user-friendly preference queries which are also theoretically optimal. Our extensive simulated experiments and user studies on a Fetch mobile manipulator demonstrate the superiority and the usability of our integrated framework.
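The two-stage procedure described in the abstract (demonstrations initialize a belief over the reward function, then preference queries refine it) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a reward that is linear in trajectory features, a belief represented by weighted random samples on the unit sphere, a simplified unnormalized exponential likelihood for demonstrations, a softmax model for preference answers, and query selection by mutual information over a fixed candidate set. All function names and parameters (`learn_reward`, `answer_fn`, `beta`, and so on) are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(weights):
    total = sum(weights)
    return [x / total for x in weights]

def sample_unit_vectors(n, dim, rng):
    """Hypothetical belief support: n candidate reward weights on the unit sphere."""
    samples = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(dot(v, v))
        samples.append([x / norm for x in v])
    return samples

def update_with_demo(samples, belief, demo_features, beta=1.0):
    """Reweight the belief by exp(beta * w . phi_demo).

    Simplifying assumption: the normalizer over alternative trajectories is dropped.
    """
    new = [b * math.exp(beta * dot(w, demo_features)) for w, b in zip(samples, belief)]
    return normalize(new)

def pref_prob(w, phi_a, phi_b):
    """Softmax probability that a user with reward weights w prefers A over B."""
    return 1.0 / (1.0 + math.exp(dot(w, phi_b) - dot(w, phi_a)))

def entropy(p):
    """Binary entropy in bits."""
    return -sum(x * math.log2(x) for x in (p, 1.0 - p) if x > 0.0)

def info_gain(samples, belief, query):
    """Mutual information between the user's answer and the reward weights."""
    phi_a, phi_b = query
    probs = [pref_prob(w, phi_a, phi_b) for w in samples]
    marginal = sum(b * p for b, p in zip(belief, probs))
    return entropy(marginal) - sum(b * entropy(p) for b, p in zip(belief, probs))

def update_with_preference(samples, belief, query, prefers_a):
    """Bayesian reweighting of the belief after one preference answer."""
    phi_a, phi_b = query
    new = []
    for w, b in zip(samples, belief):
        p = pref_prob(w, phi_a, phi_b)
        new.append(b * (p if prefers_a else 1.0 - p))
    return normalize(new)

def learn_reward(demos, queries, answer_fn, n_samples=500, n_queries=10, seed=0):
    """Stage 1: demonstrations set the belief. Stage 2: active preference queries refine it."""
    rng = random.Random(seed)
    dim = len(demos[0])
    samples = sample_unit_vectors(n_samples, dim, rng)
    belief = [1.0 / n_samples] * n_samples
    for phi in demos:
        belief = update_with_demo(samples, belief, phi)
    for _ in range(n_queries):
        query = max(queries, key=lambda q: info_gain(samples, belief, q))
        belief = update_with_preference(samples, belief, query, answer_fn(query))
    # Posterior mean of the reward weights
    return [sum(b * w[i] for w, b in zip(samples, belief)) for i in range(dim)]
```

Here `demos` is a list of feature vectors, `queries` is a list of `(phi_a, phi_b)` feature pairs, and `answer_fn(query)` returns `True` when the user prefers the first option. Choosing from a fixed candidate set keeps the sketch short; a practical system would more likely optimize over trajectories when generating each query.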


Similar resources

Clonal Relatedness of Enterotoxigenic and Enteropathogenic Escherichia coli Isolates from Diverse Human, Foods and Calf Sources

Background: Foodborne infection caused by Enterotoxigenic Escherichia coli (ETEC) and Enteropathogenic Escherichia coli (EPEC) is one of the major health problems, particularly in the developing countries. Therefore, it is vital to identify the origin of food contamination to plan control strategies efficiently. Method: A tota...


Investigating the Effect of Motivation and Attitude towards Learning English, Learning Style Preferences and Gender on Iranian EFL Learners' Proficiency

The present study was conducted to investigate the effect of motivation and attitude towards learning English, learning style preferences, and gender on the proficiency of Iranian EFL learners. To this end, 154 Iranian EFL learners participated in the study. Three data collection instruments, including the Oxford English proficiency placement test, the Brach learning style preferences questionnaire, and a questionnaire on motivation and attitude towards learning English, ...

Learning Skills from Human Demonstrations

Many robots are designed for use in domestic environments where robots will be engaged in household chores. The robots need to learn ways to do the household chores that humans are now doing. We are taking a learning from demonstration (LfD) approach to this problem [1]. In terms of the household chores, a number of tasks are developed so far; for example, bringing a beer bottle from a refriger...


Integrating reinforcement learning with human demonstrations of varying ability

This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations transferred into a baseline policy for an agent and refined using reinforcement learning sig...


Learning from Demonstrations: Is It Worth Estimating a Reward Function?

This paper provides a comparative study between Inverse Reinforcement Learning (IRL) and Apprenticeship Learning (AL). IRL and AL are two frameworks, using Markov Decision Processes (MDP), which are used for the imitation learning problem where an agent tries to learn from demonstrations of an expert. In the AL framework, the agent tries to learn the expert policy whereas in the IRL framework, ...



Journal

Journal title: The International Journal of Robotics Research

Year: 2021

ISSN: 1741-3176, 0278-3649

DOI: https://doi.org/10.1177/02783649211041652